library(htmltools)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 0.3.5
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(dplyr)
library(leaflet)
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(ggplot2)
# Load the data. Link to data: https://mappingpoliceviolence.org/
police_killing_2022 <- read_csv("Mapping-Police-Violence-DIVIDED DATA.csv")
## Rows: 10763 Columns: 64
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (52): name, gender, race, victim_image, street_address, city, state, zip...
## dbl (12): age, day, month, year, wapo_id, mpv_id, fe_id, tract, hhincome_med...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
The data I use here is the cleaned data from OpenRefine, where the date has been separated into day, month, and year.
## Create the Cluster Map of victims in 2022
I found the following code in Brent Thorne’s video: https://www.youtube.com/watch?v=dBk8gGX1MNk (accessed 09/12/2022)
police_killing_2022$longitude <- as.numeric(police_killing_2022$longitude)
police_killing_2022$latitude <- as.numeric(police_killing_2022$latitude)
Even though the coordinates in the data frame are all written numerically, R can sometimes misinterpret the data and read it as character. If the data we want to map is stored as character, the code won’t work and the map won’t show. So, to ensure that the data is in fact numeric, we can use the code written above.
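A small sketch of this behaviour (with toy values, not from the dataset): as.numeric() turns anything it cannot parse into NA, which is also why the map further down warns about rows with missing or invalid coordinates.

```r
# Toy vector: two valid coordinates and one unparseable string
coords <- c("-122.42", "37.77", "unknown")

# as.numeric() converts what it can; "unknown" becomes NA (with a warning)
parsed <- suppressWarnings(as.numeric(coords))

sum(is.na(parsed))  # 1 value could not be converted
```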
The reason for using “$”: to the left of the $ is the data frame we want to use; to the right of the $ is the column from that data frame. It works almost like an intermediary: it directs R to where it should get the information from (but unlike “<-”, it does not assign anything).
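A minimal illustration with a made-up data frame (the column names are just for the example): df$age and df[["age"]] both extract one column as a plain vector.

```r
# Toy data frame standing in for the real dataset
df <- data.frame(state = c("OR", "TX"), age = c(39, 29))

df$age                           # extract the age column
identical(df$age, df[["age"]])   # TRUE -- both forms give the same vector
```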
long <- police_killing_2022[[43]]
lat <- police_killing_2022[[42]]
The reason for making the longitude and latitude their own values is that (as you will see in the leaflet code chunk [the map]) it makes the coding easier, so I don’t have to enter the coordinates separately.
Description:
- long/lat = the names of the new values
- police_killing_2022 = the data frame the value is created from
- [[42]]/[[43]] = the column the new value gets its data from
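A hedged alternative to the positional [[42]]/[[43]] indexing: referring to the columns by name keeps working even if columns are added or reordered (toy data frame for illustration):

```r
# Toy stand-in for police_killing_2022 with only the coordinate columns
toy <- data.frame(latitude  = c("45.3", "29.4"),
                  longitude = c("-123.1", "-98.5"))

# Same result as toy[[1]] / toy[[2]], but robust to column order
lat  <- as.numeric(toy[["latitude"]])
long <- as.numeric(toy[["longitude"]])
```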
police_killing_2022 %>%
filter(year == 2022)
## # A tibble: 813 × 64
## name age gender race victi…¹ day month year stree…² city state zip
## <chr> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <chr>
## 1 Jerem… 39 Male Nati… <NA> 18 9 2022 Bingha… Adams OR <NA>
## 2 Aleja… 29 Male Hisp… https:… 18 9 2022 3200 b… San … TX 78207
## 3 Luis … 19 Male Hisp… https:… 17 9 2022 400 W … Los … CA 90003
## 4 Antho… 48 Male Unkn… <NA> 17 9 2022 4100 b… Harw… MD 20776
## 5 Colby… 39 Male White https:… 16 9 2022 inters… Weat… OK 73096
## 6 Alexi… 22 Male Hisp… https:… 16 9 2022 1100 b… Ingl… CA 90304
## 7 Westo… 35 Male White https:… 15 9 2022 7402 S… Covi… OK 73730
## 8 Timot… 29 Male White https:… 14 9 2022 Highwa… Turn… TX 75684
## 9 Sherm… 40 Male Black <NA> 13 9 2022 1500 b… Milw… WI 53208
## 10 Tysha… 25 Male Black https:… 13 9 2022 609 Ch… Rock… SC 29732
## # … with 803 more rows, 52 more variables: county <chr>,
## # agency_responsible <chr>, ori <chr>, cause_of_death <chr>,
## # circumstances <chr>, disposition_official <chr>, officer_charged <chr>,
## # news_urls <chr>, signs_of_mental_illness <chr>, allegedly_armed <chr>,
## # wapo_armed <chr>, wapo_threat_level <chr>, wapo_flee <chr>,
## # wapo_body_camera <chr>, wapo_id <dbl>, off_duty_killing <chr>,
## # geography <chr>, mpv_id <dbl>, fe_id <dbl>, encounter_type <chr>, …
Usually, when using the “filter” function, you have to surround the value in quotation marks, but because the data I want to filter on is numeric, quotation marks aren’t necessary. To find out whether the data you want to use is numeric or character, you can simply open the data frame (it will open in a new tab) and hover the mouse over the name of the column.
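Besides hovering in the data viewer, you can also ask R directly which type a column has; a small sketch with a toy data frame:

```r
toy <- data.frame(year = c(2022, 2021), state = c("OR", "TX"))

class(toy$year)     # "numeric"   -> filter(year == 2022) needs no quotes
class(toy$state)    # "character" -> filter(state == "OR") needs quotes
sapply(toy, class)  # the type of every column at once
```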
leaflet() %>%
addTiles() %>%
addMarkers(lng = long,
lat = lat,
popup = paste(police_killing_2022$race,'<br>',
police_killing_2022$name, '<br>',
police_killing_2022$age, '<br>',
police_killing_2022$gender, '<br>',
police_killing_2022$city),
clusterOptions = markerClusterOptions())
## Warning in validateCoords(lng, lat, funcName): Data contains 13 rows with either
## missing or invalid lat/lon values and will be ignored
If you click on one of the clusters, it spreads out into smaller clusters until you reach a single marker, which here is a blue popup. The popup shows some information about the victim: race, name, age, and the city where it happened. Unfortunately, not all of this information is always available, for unknown reasons (e.g. an unknown male from Salem). Another flaw with this visualization is that it is not possible to separate the clusters by race, so in order to figure out how many people of each race are actually killed, another form of visualization is needed.
Tutorial used for this visualization:
<https://rstudio.github.io/leaflet/markers.html> (used 06/01/2023)
Description of the code used [Map]: - leaflet is the package. - addTiles() draws the map. - the lng and lat in “addMarkers” are longitude and latitude respectively. Instead of plotting the coordinates one by one, I made values (as you can see in the grey code chunk above, which is also described there) that lead back to the longitude and latitude columns in my dataset. I found the clusterOptions code here: https://rstudio.github.io/leaflet/markers.html
It appears that, for some reason, RStudio won’t load my map correctly, so it shows all the casualties from 2013-2022. In my final report, however, you can see the map as it should be seen.
police_killing_2022 %>%
count(race, sort = TRUE)
## # A tibble: 8 × 2
## race n
## <chr> <int>
## 1 White 4628
## 2 Black 2703
## 3 Hispanic 1886
## 4 Unknown race 1096
## 5 Asian 154
## 6 Native American 145
## 7 <NA> 91
## 8 Native Hawaiian and Pacific Islander 60
The reason for counting the victims by race is that it is a good way to get an overview of the numbers (note that this count covers the whole dataset, not just 2022). It is also useful for another form of visualization:
### Number of killings, divided by race - but not population percentage

For this visualization, I’ve used some of the code we worked with during the week 48 lesson on web scraping (what I’ve used from it is the filtering by race, seen in line three).
illustration_1 <- police_killing_2022 %>%
  select(state, year, race) %>%
  filter(race %in% c("Black", "White", "Hispanic", "Native American", "Unknown race")) %>%
  filter(year == 2022) %>%
  count(state, race) %>%
  rename(Victims = "n") %>%
  ggplot(aes(x = state,
             y = Victims,
             fill = race)) +
  geom_col() +
  theme(axis.text = element_text(angle = 90))
ggplotly(illustration_1)
## Killings in the US done by police, divided by race and state population
library(readxl)
us_states_code <- read_excel("NST-EST2022-POP. ALTERED DATA.xlsx")
Avoid scientific notation in the percentage values:
options(scipen = 999)
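A quick illustration of what options(scipen = 999) changes (default behaviour shown first; scipen is a penalty against printing in scientific notation):

```r
options(scipen = 0)    # default: small numbers print in scientific notation
format(0.00000136)     # "1.36e-06"

options(scipen = 999)  # heavily penalise scientific notation
format(0.00000136)     # "0.00000136"
```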
police_killing_2022 %>%
select(state, year, race) %>%
filter(race %in% c("Black", "White", "Hispanic", "Native American", "Unknown race")) %>%
filter(year == "2022") %>%
count(state, race)
## # A tibble: 153 × 3
## state race n
## <chr> <chr> <int>
## 1 AK Black 1
## 2 AK Unknown race 4
## 3 AK White 2
## 4 AL Black 2
## 5 AL Unknown race 4
## 6 AL White 6
## 7 AR Black 2
## 8 AR Unknown race 2
## 9 AR White 5
## 10 AZ Black 5
## # … with 143 more rows
us_states_code <- us_states_code %>%
rename(state = state_code)
The reason for this is that, in order for left_join (see below) to work, the columns to be joined on must have the same name. So in this case, the column containing the state abbreviations is named the same (“state”) in both the police_killing data frame and the us_states data frame.
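As a hedged alternative (toy data frames for illustration), dplyr’s left_join() can also match columns with different names via a named “by” vector, which makes the rename() step optional:

```r
library(dplyr)

killings <- data.frame(state = c("AK", "AL"), n = c(7, 12))
states   <- data.frame(state_code = c("AK", "AL"),
                       pop = c(733583, 5074296))

# "state" in the left table is matched against "state_code" in the right
left_join(killings, states, by = c("state" = "state_code"))
```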
police_killing_2022 %>%
select(state, year, race) %>%
filter(race %in% c("Black", "White", "Hispanic", "Native American", "Unknown race")) %>%
filter(year == "2022") %>%
count(state, race) %>%
left_join(us_states_code, by="state") %>%
mutate(percentage_killed = n / pop)
## # A tibble: 153 × 6
## state race n state_name pop percentage_killed
## <chr> <chr> <int> <chr> <dbl> <dbl>
## 1 AK Black 1 .Alaska 733583 0.00000136
## 2 AK Unknown race 4 .Alaska 733583 0.00000545
## 3 AK White 2 .Alaska 733583 0.00000273
## 4 AL Black 2 .Alabama 5074296 0.000000394
## 5 AL Unknown race 4 .Alabama 5074296 0.000000788
## 6 AL White 6 .Alabama 5074296 0.00000118
## 7 AR Black 2 .Arkansas 3045637 0.000000657
## 8 AR Unknown race 2 .Arkansas 3045637 0.000000657
## 9 AR White 5 .Arkansas 3045637 0.00000164
## 10 AZ Black 5 .Arizona 7359197 0.000000679
## # … with 143 more rows
A disclaimer: the total population of each state is repeated on every row, so the pop value next to each race does not mean the population of, e.g., Black residents in that state. However, the percentage of victims (by race) in each state is correct.
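One hedged readability tweak: raw proportions such as 0.00000136 are hard to compare, so a rate per 100,000 inhabitants is often used instead (toy numbers below, not a change to the analysis above):

```r
library(dplyr)

toy <- data.frame(state = c("AK", "AL"),
                  n = c(1, 2),
                  pop = c(733583, 5074296))

# Same information as n / pop, just on a human-readable scale
toy %>% mutate(per_100k = n / pop * 100000)
```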
col_2 <- police_killing_2022 %>%
  select(state, year, race) %>%
  filter(race %in% c("Black", "White", "Hispanic", "Native American", "Unknown race")) %>%
  filter(year == 2022) %>%
  count(state, race) %>%
  left_join(us_states_code, by = "state") %>%
  mutate(percentage_killed = n / pop) %>%
  rename(Victims = "n") %>%
  ggplot(aes(x = state,
             y = percentage_killed,
             fill = race)) +
  geom_col() +
  theme(axis.text = element_text(angle = 90))
ggplotly(col_2)
## Warning: Removed 11 rows containing missing values (`position_stack()`).
police_killing_2022 %>%
select(state, year, race) %>%
count(race, sort=TRUE)
## # A tibble: 8 × 2
## race n
## <chr> <int>
## 1 White 4628
## 2 Black 2703
## 3 Hispanic 1886
## 4 Unknown race 1096
## 5 Asian 154
## 6 Native American 145
## 7 <NA> 91
## 8 Native Hawaiian and Pacific Islander 60
police_killing_2022 %>%
select(state, year, race) %>%
filter(year == "2022") %>%
filter(race == "Black") %>%
count(state, sort=TRUE)
## # A tibble: 38 × 2
## state n
## <chr> <int>
## 1 TX 22
## 2 GA 18
## 3 FL 15
## 4 NC 13
## 5 OH 12
## 6 SC 12
## 7 MO 7
## 8 NY 7
## 9 VA 7
## 10 AZ 5
## # … with 28 more rows
police_killing_2022 %>%
select(state, year, race) %>%
filter(year == "2022") %>%
filter(race == "White") %>%
count(state, sort=TRUE)
## # A tibble: 49 × 2
## state n
## <chr> <int>
## 1 CA 16
## 2 OH 16
## 3 FL 14
## 4 CO 13
## 5 TX 13
## 6 GA 10
## 7 OR 10
## 8 WA 10
## 9 PA 9
## 10 TN 9
## # … with 39 more rows
sessionInfo()
## R version 4.2.2 (2022-10-31)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.5 LTS
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/atlas/libblas.so.3.10.3
## LAPACK: /usr/lib/x86_64-linux-gnu/atlas/liblapack.so.3.10.3
##
## locale:
## [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
## [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
## [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
## [10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] readxl_1.4.1 plotly_4.10.1 leaflet_2.1.1 forcats_0.5.2
## [5] stringr_1.5.0 dplyr_1.0.10 purrr_0.3.5 readr_2.1.3
## [9] tidyr_1.2.1 tibble_3.1.8 ggplot2_3.4.0 tidyverse_1.3.2
## [13] htmltools_0.5.4
##
## loaded via a namespace (and not attached):
## [1] lubridate_1.9.0 assertthat_0.2.1 digest_0.6.30
## [4] utf8_1.2.2 R6_2.5.1 cellranger_1.1.0
## [7] backports_1.4.1 reprex_2.0.2 evaluate_0.18
## [10] httr_1.4.4 pillar_1.8.1 rlang_1.0.6
## [13] lazyeval_0.2.2 googlesheets4_1.0.1 data.table_1.14.6
## [16] rstudioapi_0.14 jquerylib_0.1.4 rmarkdown_2.18
## [19] labeling_0.4.2 googledrive_2.0.0 htmlwidgets_1.5.4
## [22] bit_4.0.5 munsell_0.5.0 broom_1.0.1
## [25] compiler_4.2.2 modelr_0.1.10 xfun_0.35
## [28] pkgconfig_2.0.3 tidyselect_1.2.0 viridisLite_0.4.1
## [31] fansi_1.0.3 crayon_1.5.2 tzdb_0.3.0
## [34] dbplyr_2.2.1 withr_2.5.0 grid_4.2.2
## [37] jsonlite_1.8.4 gtable_0.3.1 lifecycle_1.0.3
## [40] DBI_1.1.3 magrittr_2.0.3 scales_1.2.1
## [43] vroom_1.6.0 cli_3.4.1 stringi_1.7.8
## [46] cachem_1.0.6 farver_2.1.1 fs_1.5.2
## [49] xml2_1.3.3 bslib_0.4.1 ellipsis_0.3.2
## [52] generics_0.1.3 vctrs_0.5.1 tools_4.2.2
## [55] bit64_4.0.5 glue_1.6.2 hms_1.1.2
## [58] crosstalk_1.2.0 parallel_4.2.2 fastmap_1.1.0
## [61] yaml_2.3.6 timechange_0.1.1 colorspace_2.0-3
## [64] gargle_1.2.1 rvest_1.0.3 knitr_1.41
## [67] haven_2.5.1 sass_0.4.4